10. Dummy Variables

Suppose you want to add a categorical variable to a regression model that records the color of a street light: red, yellow, or green. How many dummy variable columns will be added to your linear regression model?


If you have a categorical variable with two levels, yes or no, how many dummy variables would need to be added to a linear model to use this variable?


Imagine you own a restaurant, and you have a ratings scale of: 'great', 'good', 'okay', 'poor', or 'awful'. You would like to understand how the tip given depends on this rating, so you build a linear model, using dummy variables to represent the ratings. How many total coefficients are in your model?


Which of the following statements are true of the dummy variables we add to our multiple linear regression models? Let X be the X matrix as defined in the previous Screencast. Mark all that are true.

  • For each categorical variable, the number of dummy variables added to your X matrix should always be the number of levels minus 1.
  • The reason for dropping a dummy variable is to ensure that all of our columns are linearly independent.
  • The reason for dropping a dummy variable is to ensure that the matrix product X'X is invertible.
  • The reason for dropping a dummy variable is to ensure that your X matrix is full rank.
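
One way to explore the statements above is to build a small X matrix by hand and compare its rank with its number of columns. The sketch below is only an illustration: the six street light observations are made up, the names `X_all` and `X_dropped` are hypothetical, and it assumes the model includes an intercept column of ones.

```python
import numpy as np

# Hypothetical data: six observations of a three-level categorical variable.
colors = np.array(["red", "yellow", "green", "red", "green", "yellow"])
levels = ["red", "yellow", "green"]

# One 0/1 indicator column per level, with no level dropped yet.
dummies = np.column_stack([(colors == lvl).astype(float) for lvl in levels])
intercept = np.ones((len(colors), 1))

# X with an intercept and ALL three dummy columns.
X_all = np.hstack([intercept, dummies])
# X with an intercept and the first dummy dropped ("red" becomes the baseline).
X_dropped = np.hstack([intercept, dummies[:, 1:]])

# A matrix is full column rank when its rank equals its number of columns.
rank_all = np.linalg.matrix_rank(X_all)
rank_dropped = np.linalg.matrix_rank(X_dropped)

# The same comparison for X'X, which must be invertible to solve for the coefficients.
rank_xtx_all = np.linalg.matrix_rank(X_all.T @ X_all)
rank_xtx_dropped = np.linalg.matrix_rank(X_dropped.T @ X_dropped)

print("all dummies:  columns =", X_all.shape[1], "rank =", rank_all)
print("one dropped:  columns =", X_dropped.shape[1], "rank =", rank_dropped)
print("X'X ranks:   ", rank_xtx_all, "vs", rank_xtx_dropped)
```

Comparing each rank with the corresponding column count shows whether the columns are linearly independent. With every level kept, the dummy columns sum to the intercept column, which is the dependence that dropping one level removes.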